Statistical Analysis for Thesaurus Construction using an Encyclopedic Corpus
Authors
Abstract
Conclusion: Discrimination of the hierarchical relation of a word pair using an encyclopedic corpus called the Cyclone corpus. In order not to miss an indirect relationship, a semantic expansion technique for descriptions is used. The proposed method is able to detect 66.1% of relations.
Future work: Discrimination between hierarchical and synonymous relations.
Previous work: To extract hyponyms, synonyms, and hypernyms, the following are used: sentences that have specific syntactic patterns (“a part of”, “is a”, “such as”) (Marti, 1992; Tsurumaru, 1991); descriptions in a dictionary (Suzuki, 2003); and specific document structure (Shinzato, 2004).
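The pattern-based extraction cited as previous work can be illustrated with a minimal sketch. It is not the method of this paper or of the cited authors. The regular expressions, the `PATTERNS` table, and the `extract_pairs` function are hypothetical names introduced here. The sketch assumes single-word English terms and covers only the “is a” and “such as” patterns, leaving out “a part of”.

```python
import re

# Hypothetical, simplified lexical patterns (assumption: single-word terms only).
PATTERNS = [
    # "X is a Y" / "X is an Y": X is read as a hyponym of Y.
    (re.compile(r"\b(\w+) is an? (\w+)"), "hyponym_first"),
    # "Y such as X": X is read as a hyponym of Y.
    (re.compile(r"\b(\w+) such as (\w+)"), "hypernym_first"),
]


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    text = sentence.lower()
    pairs = []
    for pattern, order in PATTERNS:
        for first, second in pattern.findall(text):
            if order == "hyponym_first":
                pairs.append((first, second))
            else:
                # The hypernym came first in the sentence, so swap the order.
                pairs.append((second, first))
    return pairs


if __name__ == "__main__":
    print(extract_pairs("A sparrow is a bird."))
    print(extract_pairs("Birds such as sparrows migrate."))
```

Surface patterns of this kind only fire when both terms occur in the same sentence, which is one reason a corpus-statistics approach such as the one summarized above may be considered as a complement.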
Similar resources
Statistical Thesaurus Construction for a Morphologically Rich Language
Corpus-based thesaurus construction for Morphologically Rich Languages (MRL) is a complex task, due to the morphological variability of MRL. In this paper we explore alternative term representations, complemented by clustering of morphological variants. We introduce a generic algorithmic scheme for thesaurus construction in MRL, and demonstrate the empirical benefit of our methodology for a Heb...
Working on a botanic corpus
Extracting information from an encyclopedic botanical corpus may be done by hand, but it is long and tedious work. It is becoming increasingly attractive and feasible to speed up the process by automating it while still keeping a human expert for validation. Among the different kinds of information that may be extracted from a botanic corpus, we can cite terminology, conceptual information to ...
A Method of Automatic Hypertext Construction from an Encyclopedic Dictionary of a Specific Field
Nowadays, very large volumes of text are created and stored on computers, and as a result the retrieval of texts that fit a user's demand has become a difficult problem. Hypertext is a typical system to answer this problem, whose primary objective is to establish flexible associative links between relevant text parts and to allow users to select and trace links to see relev...
Knowledge Acquisition: Classification of Terms in a Thesaurus from a Corpus
Faced with growing volume and accessibility of electronic textual information, information retrieval, and, in general, automatic documentation require updated terminological resources that are ever more voluminous. A current problem is the automated construction of these resources (e.g., terminologies, thesauri, glossaries, etc.) from a corpus. Various linguistic and statistical methods to ...
Automatic Thai Ontology Construction and Maintenance System
Ontology is an essential resource for enhancing the performance of information processing systems, such as information integration, document classification in taxonomies, information retrieval, and data cleaning in database systems. This paper proposes three methodologies for Automatic Thai Ontology Construction and Maintenance from technical corpus, dictionary and thesaurus. For corpus base...